Context-Based Features Enhance Protein Secondary Structure Prediction Accuracy

نویسندگان

  • Ashraf Yaseen
  • Yaohang Li
چکیده

We report a new approach of using statistical context-based scores as encoded features to train neural networks to achieve secondary structure prediction accuracy improvement. The context-based scores are pseudo-potentials derived by evaluating statistical, high-order inter-residue interactions, which estimate the favorability of a residue adopting certain secondary structure conformation within its amino acid environment. Encoding these context-based scores as important training and prediction features provides a way to address a long-standing difficulty in neural network-based secondary structure predictions of taking interdependency among secondary structures of neighboring residues into account. Our computational results have shown that the context-based scores are effective features to enhance the prediction accuracy of secondary structure predictions. An overall 7-fold cross-validated Q3 accuracy of 82.74% and Segment Overlap Accuracy (SOV) accuracy of 86.25% are achieved on a set of more than 7987 protein chains with, at most, 25% sequence identity. The Q3 prediction accuracy on benchmarks of CB513, Manesh215, Carugo338, as well as CASP9 protein chains is higher than popularly used secondary structure prediction servers, including Psipred, Profphd, Jpred, Porter (ab initio), and Netsurf. More significant improvement is observed in the SOV accuracy, where more than 4% enhancement is observed, compared to the server with the best SOV accuracy. A Q8 accuracy of >70% (71.5%) is also found in eight-state secondary structure prediction. The majority of the Q3 accuracy improvement is contributed from correctly identifying β-sheets and α-helices. When the context-based scores are incorporated, there are 15.5% more residues predicted with >90% confidence. These high-confidence predictions usually have a rather high accuracy (averagely ~95%). The three- and eight-state prediction servers (SCORPION) implementing our methods are available online.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Protein Secondary Structure Prediction: a Literature Review with Focus on Machine Learning Approaches

DNA sequence, containing all genetic traits is not a functional entity. Instead, it transfers to protein sequences by transcription and translation processes. This protein sequence takes on a 3D structure later, which is a functional unit and can manage biological interactions using the information encoded in DNA. Every life process one can figure is undertaken by proteins with specific functio...

متن کامل

In quest of an empirical potential for protein structure prediction.

Key to successful protein structure prediction is a potential that recognizes the native state from misfolded structures. Recent advances in empirical potentials based on known protein structures include improved reference states for assessing random interactions, sidechain-orientation-dependent pair potentials, potentials for describing secondary or supersecondary structural preferences and, m...

متن کامل

Predicting Protein Solvent Accessibility with Sequence, Evolutionary Information and Context-based Features

Solvent-accessible surface areas of residues in proteins are key factors in protein folding. Predicting solvent accessibility from protein sequences is significant for modeling the structural and functional characteristics of many proteins. In this work, we introduce an approach of enhancing solvent accessibility prediction accuracy. We derive pseudo-potentials, by considering high-orderinter-r...

متن کامل

Physicochemical Position-Dependent Properties in the Protein Secondary Structures

Background: Establishing theories for designing arbitrary protein structures is complicated and depends on understanding the principles for protein folding, which is affected by applied features. Computer algorithms can reach high precision and stability in computationally designing enzymes and binders by applying informative features obtained from natural structures. Methods: In this study, a ...

متن کامل

Enhancing Protein Fold Prediction Accuracy Using Evolutionary and Structural Features

Protein fold recognition (PFR) is considered as an important step towards the protein structure prediction problem. It also provides crucial information about the functionality of the proteins. Despite all the efforts that have been made during the past two decades, finding an accurate and fast computational approach to solve PFR still remains a challenging problem for bioinformatics and comput...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Journal of chemical information and modeling

دوره 54 3  شماره 

صفحات  -

تاریخ انتشار 2014